 North Miami


RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs

Bi, Baolong, Liu, Shenghua, Ren, Xingzhang, Liu, Dayiheng, Lin, Junyang, Wang, Yiwei, Mei, Lingrui, Fang, Junfeng, Guo, Jiafeng, Cheng, Xueqi

arXiv.org Artificial Intelligence

The foundational capabilities of large language models (LLMs) are deeply influenced by the quality of their pre-training corpora. However, enhancing data quality at scale remains a significant challenge, primarily due to the trade-off between refinement effectiveness and processing efficiency. While rule-based filtering remains the dominant paradigm, it typically operates at the document level and lacks the granularity needed to refine specific content within documents. Inspired by emerging work such as ProX, we propose RefineX, a novel framework for large-scale, surgical refinement of pre-training data through programmatic editing tasks. RefineX enables efficient and fine-grained data refinement while reliably preserving the diversity and naturalness of raw text. The core strength of RefineX lies in distilling high-quality, expert-guided end-to-end refinement results into minimal edit-based deletion programs. This high-precision distillation pipeline is used to train an efficient and reliable refine model that can systematically improve every instance in the corpus at scale. We evaluate RefineX across from-scratch pre-training at multiple model scales and find that it consistently outperforms models trained on raw, filtered, or alternatively refined data across diverse downstream tasks. On the 750M model, RefineX yields 2.6%-7.2% average gains on lighteval tasks, and achieves comparable performance using significantly fewer training tokens. Further analysis shows that RefineX reliably enhances text quality with both high efficiency and precision, outperforming prior approaches such as end-to-end generation and ProX-C. These results position RefineX as a scalable, effective, and reliable solution for optimizing pre-training data in modern LLM pipelines.
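
The idea of turning an end-to-end refinement result into a deletion-only program can be illustrated with a small sketch. This is not the authors' pipeline: it assumes a word-level diff computed with Python's standard `difflib`, and the function names `deletion_program` and `apply_deletions` are hypothetical. Any insertion or replacement the refiner made is discarded, so applying the program can only remove text that was already in the raw document.

```python
from difflib import SequenceMatcher


def deletion_program(raw: str, refined: str) -> list[tuple[int, int]]:
    """Return (start, end) word-index spans of `raw` that `refined` dropped.

    Hypothetical helper: only pure deletions are kept. Insertions and
    replacements in `refined` are ignored, so the program never adds text.
    """
    raw_words = raw.split()
    refined_words = refined.split()
    matcher = SequenceMatcher(a=raw_words, b=refined_words, autojunk=False)
    spans = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            spans.append((i1, i2))
    return spans


def apply_deletions(raw: str, spans: list[tuple[int, int]]) -> str:
    """Apply a deletion program to `raw` and return the remaining words."""
    dropped = set()
    for start, end in spans:
        dropped.update(range(start, end))
    words = raw.split()
    return " ".join(w for i, w in enumerate(words) if i not in dropped)


raw_text = "Click here to subscribe! Macy's is testing a mobile tool. Share on Facebook"
refined_text = "Macy's is testing a mobile tool."
program = deletion_program(raw_text, refined_text)
cleaned = apply_deletions(raw_text, program)
```

In this toy case the program removes the leading and trailing page residue and leaves the sentence itself untouched. If the refined text had rewritten a word instead of deleting it, that change would be dropped and the raw wording kept, which is one way to read the abstract's emphasis on preserving the naturalness of the raw text.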


A new shopping companion @Macys & @IBMWatson

#artificialintelligence

NEW YORK, NY - 20 Jul 2016: Today, Macy's announced the pilot of "Macy's On Call," a mobile web tool that allows customers to interact with an AI-powered platform via their mobile devices. "Macy's On Call" taps IBM (NYSE: IBM) Watson, via Satisfi, an intelligent engagement platform, to deliver a first-of-its-kind solution that will enhance the customer in-store shopping experience at 10 test locations nationwide. Macy's On Call allows customers to input questions in natural language about each participating store's unique product assortment, services and facilities and then receive a customized response to the inquiry. Macy's is currently piloting the new tool in 10 store locations across the country.


Macy's Tests Artificial Intelligence Tool To Improve Service - CRM Daily

#artificialintelligence

The tool, which the nation's largest department store chain calls a "mobile companion," can be accessed for now through a browser and will accept questions in 10 U.S. locations about products, services and facilities. It uses natural language and offers feedback in seconds. It's developed by IBM Watson -- the Jeopardy-winning "cognitive computing" service -- and is designed to keep learning more about the store's customers. That's a key element as Macy's seeks to spur sluggish sales, make being at the store more enjoyable and distinguish itself from online portals and specialty retailers. "We want to improve the shopping experience. We want the customers to shop at Macy's and come back," Serena Potter, Macy's group vice president of digital media strategy, told The Associated Press.


Macy's Tests Artificial Intelligence Tool to Improve Service

#artificialintelligence

Macy's is testing a mobile tool using artificial intelligence that lets shoppers get answers customized to the store they're in -- like where a particular brand is located or what's in stock -- that they would normally ask a sales associate face-to-face. The tool, which the nation's largest department store chain calls a "mobile companion," can be accessed for now through a browser and will accept questions in 10 U.S. locations about products, services and facilities. It uses natural language and offers feedback in seconds. It's developed by IBM Watson -- the Jeopardy-winning "cognitive computing" service -- and is designed to keep learning more about the store's customers. That's a key element as Macy's seeks to spur sluggish sales, make being at the store more enjoyable and distinguish itself from online portals and specialty retailers.


Macy's tests artificial intelligence tool to improve service

#artificialintelligence

Macy's is testing a mobile tool using artificial intelligence that lets shoppers get answers customized to the store they're in -- like where a particular brand is located or what's in stock -- that they would normally ask a sales associate face-to-face. The tool, which the nation's largest department store chain calls a "mobile companion," can be accessed for now through a browser and will accept questions in 10 U.S. locations about products, services and facilities. It uses natural language and offers feedback in seconds. It's developed by IBM Watson -- the Jeopardy-winning "cognitive computing" service -- and is designed to keep learning more about the store's customers. That's a key element as Macy's seeks to spur sluggish sales, make being at the store more enjoyable and distinguish itself from online portals and specialty retailers.


Macy's tests artificial intelligence tool to improve service

Daily Mail - Science & tech

Macy's has revealed it has developed an AI app with IBM's Watson to guide shoppers around its stores. It allows customers to get answers customized to the store they're in - like where a particular brand is located or what's in stock - that they would normally ask a sales associate face-to-face. The tool, which the nation's largest department store chain calls a 'mobile companion,' can be accessed for now through a browser and will accept questions in 10 U.S. locations about products, services and facilities.


Macy's tests artificial intelligence tool to improve service - The Tropixs

#artificialintelligence

Macy's is testing a mobile tool using artificial intelligence that lets shoppers get answers customized to the store they're in – like where a particular brand is located or what's in stock – that they would normally ask a sales associate face-to-face. The tool, which the nation's largest department store chain calls a "mobile companion," can be accessed for now through a browser and will accept questions in 10 U.S. locations about products, services and facilities. It uses natural language and offers feedback in seconds. It's developed by IBM Watson – the Jeopardy-winning "cognitive computing" service – and is designed to keep learning more about the store's customers. That's a key element as Macy's seeks to spur sluggish sales, make being at the store more enjoyable and distinguish itself from online portals and specialty retailers.


Macy's tests artificial intelligence tool to improve service

#artificialintelligence

Macy's is testing a mobile tool using artificial intelligence that lets shoppers get answers customized to the store they're in--like where a particular brand is located or what's in stock--that they would normally ask a sales associate face-to-face. The tool, which the nation's largest department store chain calls a "mobile companion," can be accessed for now through a browser and will accept questions in 10 U.S. locations about products, services and facilities. It uses natural language and offers feedback in seconds. It's developed by IBM Watson--the Jeopardy-winning "cognitive computing" service--and is designed to keep learning more about the store's customers. That's a key element as Macy's seeks to spur sluggish sales, make being at the store more enjoyable and distinguish itself from online portals and specialty retailers.